Speech Technology for Unwritten Languages
Related resources
Normalising Audio Transcriptions for Unwritten Languages
The task of documenting the world’s languages is a mainstream activity in linguistics which is yet to spill over into computational linguistics. We propose a new task of transcription normalisation as an algorithmic method for speeding up the process of transcribing audio sources, leading to text collections of usable quality. We report on the application of sentence and word alignment algorith...
Large-Scale Text Collection for Unwritten Languages
Existing methods for collecting texts from endangered languages are not creating the quantity of data that is needed for corpus studies and natural language processing tasks. This is because the process of transcribing and translating from audio recordings is too onerous. A more effective method, we argue, is to involve local speakers in the field location, using an audio-only translation inter...
Speech technology for minority languages: the case of Irish (Gaelic)
The development of speech technology could play an important role in the maintenance and preservation of minority languages, especially where the population of native speakers is dwindling. This paper outlines the efforts within the WISPR project to develop annotated spoken corpora along with some of the prerequisites for the synthesis of Irish (Gaelic). It details the particular challenges t...
ISCA SALTMIL SIG: speech and language technology for minority languages
This paper presents the International Speech Communication Association (ISCA) Special Interest Group (SIG, http://www.isca-speech.org/sig.html) on Speech And Language Technology for MInority Languages (SALTMIL). The paper gives an overview of the group's mission and discusses its past and present activities.
Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a lan...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2020
ISSN: 2329-9290,2329-9304
DOI: 10.1109/taslp.2020.2973896